AD-A042 334  COURSEWARE INC SAN DIEGO CALIF  F/G 5/9

VALIDATION OF THE INSTRUCTIONAL STRATEGY DIAGNOSTIC PROFILE (ISDP): EMPIRICAL STUDIES
APR 77  M D MERRILL, N D WOOD  N00123-76-C-0458

UNCLASSIFIED  NPRDC-TR-77-25




NPRDC  TR  77-25 


April  1977 


VALIDATION  OF  THE  INSTRUCTIONAL  STRATEGY  DIAGNOSTIC 
PROFILE  (ISDP):  EMPIRICAL  STUDIES 


M.  David  Merrill 
Norman  D.  Wood 


Courseware,  Inc. 

San  Diego,  California  92131 


Reviewed  by 
John  D.  Ford,  Jr. 


Approved  by 
James  J.  Regan 
Technical  Director 




Prepared  for 

Navy  Personnel  Research  and  Development  Center 
San  Diego,  California  92152 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE (READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER: NPRDC TR 77-25
2. GOVT ACCESSION NO.
3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle): VALIDATION OF THE INSTRUCTIONAL STRATEGY DIAGNOSTIC PROFILE (ISDP): EMPIRICAL STUDIES
5. TYPE OF REPORT & PERIOD COVERED: Final Report
7. AUTHOR(s): M. David Merrill, Norman D. Wood
8. CONTRACT OR GRANT NUMBER(s): N00123-76-C-0458
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Courseware, Incorporated, San Diego, California 92131
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: 63720N, Z0108.30A
11. CONTROLLING OFFICE NAME AND ADDRESS: Navy Personnel Research and Development Center, San Diego, California 92152
12. REPORT DATE: April 1977
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

    Instructional Strategies
    Statistics
    Feedback
    Advanced Organizers
    Instructional Strategies Diagnostic Profile (ISDP)
    Instructional Diagnosis
    Instructional Prescription


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

Three experimental studies were conducted in real-world settings in an
attempt to validate the Instructional Strategy Diagnostic Profile (ISDP) and
the accompanying design prescriptions.

Two different methodologies were used. In method one, existing
instructional materials were modified on the basis of a selected prescription
that resulted from an ISDP analysis of those materials. Two or more versions
of the materials were compared in an experimental comparison. Method two
consisted of course intervention in which a weak unit of an existing course
was identified and modified via several prescriptions resulting from an
ISDP analysis. Test performance, affect, confidence, and time were compared
for students using the revised materials and students using the original
materials.

When used to revise existing materials, the ISDP prescriptions produced
significant differences only in the second study. Failure to find the
predicted results may have been a result of confounding factors in the
real-world experimental situations used. Other studies have demonstrated that
existing materials revised according to ISDP prescriptions can produce
significant increases in student performance, especially if the interaction
with the materials can be controlled and the tests can be revised to more
adequately measure concept classification and rule-using behavior.



FOREWORD 


This research and development was conducted in support of Advanced
Development Subproject Z0108.30A (Adaptive Experimental Approach to
Instructional Design). This work is one aspect of an area concerned with
evaluation of instruction/training. Previous work reviewed the existing
research literature that has investigated the propositions underlying the
Profile. The review identified requirements for research and development
to which the studies described in the present report, which involved test
and evaluation of the ISDP, were directed.

Drs.  John  Carter  and  John  Ellis  served  as  contract  monitors.  This 
report  was  reviewed  and  edited  by  Dr.  John  Ellis. 


J.  J.  CLARKIN 
Commanding  Officer 


SUMMARY 


Problem 


The Instructional Strategy Diagnostic Profile (ISDP) was designed to
enable instructional developers and evaluators to predict the effectiveness
of and prescribe improvements for existing instructional materials. While
some evidence exists as to its effectiveness, it has not yet received
sufficient empirical evaluation.

Objective

The  purpose  of  this  research  and  development  effort  was  to  further  the 
development  and  validation  of  the  ISDP  and  the  accompanying  Instructional 
design  prescriptions. 

Approach 

Three empirical studies were conducted using two different methodologies.
Method one, which was used for the first two studies, consists of modifying
existing instructional materials based on prescriptions resulting from an
ISDP analysis of those materials. Method two, which was used for the third
study, is an intervention process in which a weak unit of an existing course
is selected and modified via several prescriptions resulting from an ISDP
analysis. Test performance, affect, confidence, and time were compared for
students using the revised materials and for those using the original materials.

Study 1, using method one on an introductory statistics course, compared
a framework presentation with a regular prose presentation, each paired with
either elaborated or correct answer feedback.

Study 2, using method one on the same statistics course as Study 1,
compared the performance of groups using connected rules, discrete rules, and
rules embedded in expository text. The connected rule involved an algorithm
for selecting which rule to use; the discrete rule contained an algorithm;
and the embedded rule neither involved nor contained an algorithm.

Study 3, using method two, revised a unit of the syllabus for an introductory
physics course and then compared that unit with the original unit in terms of
student performance, time, and affect. Lectures, textbooks, and discussions
were the same for both groups.

Findings 


In Study 1, no significant differences were observed in student performance,
affect, confidence, or time. In Study 2, posttest performance of both discrete
and connected rule groups was superior to that of the embedded rule group. On
affect, the discrete rule group was most positive, followed by the connected and
embedded rule groups. There were no time or confidence differences. Finally,
in Study 3, there were no significant differences in performance, time, or
affect between the two groups, although the means were in the predicted direction.
Failure to find the predicted results may have been a result of confounding
factors in the real-world experimental situations used.




Conclusions 


The research reviewed indicates that the propositions underlying the
Profile seem to be valid. While the data reported in this document are
somewhat inconclusive and not sufficient to support unqualified statements,
they are, nevertheless, positive. When considered with other data on the ISDP,
it seems reasonable to assume that, when the ISDP is used as a guide
to analyze and modify existing instruction, the resulting performance of
students is likely to improve. This is especially likely when the
tests as well as the main line instruction can be modified. It is less
likely when only the student syllabus is modified. The ISDP does seem to
have considerable potential as an instructional evaluation and development
tool.

Recommendations 


1. The ISDP, as presented in the ISDP training manual, is recommended
for use by Navy instructional developers and evaluators. However, it should
be considered as an experimental tool and should be used only by experienced
instructional technologists who can appropriately adapt its use to various
settings and circumstances.

2. It is recommended that ISDP validation and development efforts
continue so that this instrument can become an easy-to-use tool for all
instructional development and evaluation personnel.




CONTENTS 


Page 

INTRODUCTION  1
  Problem  1
  Purpose  1
  Background and Scope  1

APPROACH  3
  Study 1--Framework Rule Representation and Elaborated Feedback in Statistics Instruction  3
  Study 2--Test and Generality Consistency in a Classification Task  3
  Study 3--Validation of the Instructional Strategy Diagnostic Profile in Physics 100  4

STUDY 1: FRAMEWORK RULE REPRESENTATION AND ELABORATED FEEDBACK IN STATISTICS INSTRUCTION  5
  Design Challenges  5
    Rule Representation  5
    Appropriate Practice  5
  Hypotheses  6
  Methods  6
    Selection of Subject Matter  6
    Subjects  6
    Treatment Materials  7
    Instrumentation  7
    Design  22
  Results  22
  Discussion  24

STUDY 2: TEST AND GENERALITY CONSISTENCY IN A STATISTICS CLASSIFICATION TASK  25
  Problem  25
  Hypothesis  26
  Method  27
    Subject Matter Content  27
    Subjects  27
    Treatments  27
    Instrumentation  30
    Procedure  33
    Design  33


  Results  34
    Hypothesis 1  34
    Hypothesis 2  35
    Hypothesis 3  35
    Hypothesis 4  35

STUDY 3: VALIDATION OF THE INSTRUCTIONAL STRATEGY DIAGNOSTIC PROFILE IN PHYSICS 100  37
  Problem and Hypotheses  37
  Methods  37
    Subject Content  37
    Subjects  38
    Treatments  38
    Procedure  38
    Design  39
  Results  39
    Hypothesis 1  39
    Hypothesis 2  40
    Hypothesis 3  40
  Discussion  40

CONCLUSIONS  43

RECOMMENDATIONS  45

REFERENCES  47

APPENDIX  A-0

DISTRIBUTION LIST

LIST  OF  TABLES 


Page 


1. Means and Standard Deviations of Dependent Variables for Four Treatment Groups  23

2. Summary of Univariate F-Ratios on Four Dependent Variables  24

3. Means and Standard Deviations for Dependent Variables by Treatment Group  34

4. Mean Percentage Correct and Standard Deviations by Treatment Group  40

5. Mean Percentage of Items Answered Correctly for Students with Upgraded and Regular Materials  41


LIST OF FIGURES


1. Framework rule representation with elaborated feedback  8

2. Framework rule representation with correct answer feedback  11

3. Nonframework rule representation with elaborated feedback  14

4. Nonframework rule representation with correct answer feedback  17

5. Dependent variable measures including an example of a rule-using test item, a confidence scale, a semantic differential affect scale, and a time record space  20

6. Connected-rule algorithm for selection of hypothesis test  28

7. A sample practice question  29

8. An example of a discrete rule algorithm for selection of hypothesis test  31

9. A sample of the basic performance task and confidence rating  32


INTRODUCTION 


Problem 


Guidelines for predicting instructional effectiveness, if they exist
at all, are vague at best. It is almost impossible to look at an
instructional product and predict its effectiveness by the use of existing guides.

The Instructional Strategy Diagnostic Profile (ISDP) was designed (Merrill
& Wood, 1975) to enable instructional developers and evaluators to predict
the effectiveness of and prescribe improvements for existing instructional
materials. While some evidence exists as to its effectiveness, it has not
yet received sufficient empirical validation.

Purpose 

The  purpose  of  this  research  and  development  effort  was  to  further  the 
development  and  validation  of  the  ISDP  and  the  accompanying  instructional 
design  prescriptions. 

Background  and  Scope 

This project was conducted in three phases. Phase I consisted of an
extensive review of reported research studies as they relate to the
propositions underlying the Instructional Strategy Diagnostic Profile (Merrill,
Olson, & Coldeway, 1976). Results of this review indicate that there is
considerable empirical research support for most of the propositions
underlying the Profile. It was further suggested that an instructional package
that is judged to have a high ISDP index should provide rather effective
instruction.

This document is the technical report for the Phase II effort, which
consisted of three empirical studies using two different methodologies.

Phase III involved the preparation and validation of a manual for training
users in ISDP analysis. This manual will be published as a separate technical
report (Merrill, Wood, & Richards, in preparation). Preliminary validation
indicates that experienced instructional developers, who have already had some
training in the vocabulary of the ISDP, are able to consistently rate existing
instruction and to prescribe modifications by using the guidelines provided
by the ISDP training manual.


APPROACH 




As indicated previously, Phase II of this project consisted of conducting
three empirical studies using two different methodologies. Method one, which
was used for the first two studies, consists of modifying existing instructional
materials based on prescriptions resulting from an ISDP analysis of those
materials. Method two, which was used for the third study, is an intervention
process, in which (1) a course is analyzed via the ISDP, (2) a weak unit of
instruction is selected, (3) the test used for that unit is revised such that
it yields a higher ISDP index, and (4) the unit of instruction selected is
modified so that the strategy used yields a higher ISDP index. The original
and revised instructions are then administered to randomly assigned groups,
and performance on both is compared. The three studies are described briefly
below and in detail in the following sections.

Study  1 — Framework  Rule  Representation  and  Elaborated  Feedback  in  Statistics 
Instruction 


Using method one in an introductory statistics course, a framework
rule representation was compared with a regular prose rule representation,
and correct answer feedback was compared with elaborated feedback. On the
posttest, there were no performance differences. Students were also compared
on (1) the appeal of the instruction, as measured by a questionnaire, (2)
confidence in their responses, and (3) time required to complete the
instruction. There were no significant differences on any of these dependent measures.
The ISDP rating of the instruction before the modification was very good. It
was suggested that the relationship between performance and the ISDP index is
a decreasing function. That is, it requires a large increment at the high
end of the ISDP scale to result in a measurable performance difference, as
compared to a small increment at the low end of the scale.
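
The suggested relationship can be pictured with a minimal sketch. The report gives no functional form, so the saturating exponential, the 0-to-1 index scale, and the `rate` parameter below are all illustrative assumptions rather than anything estimated from the data.

```python
import math


def expected_performance(isdp_index: float, rate: float = 3.0) -> float:
    """Hypothetical saturating curve relating ISDP index to performance.

    Both the exponential form and `rate` are assumptions chosen only to
    show diminishing returns; the index is assumed to run from 0 to 1.
    """
    return 1.0 - math.exp(-rate * isdp_index)


def gain(low: float, high: float) -> float:
    """Performance change when the index moves from `low` to `high`."""
    return expected_performance(high) - expected_performance(low)


# The same 0.1 increment in the index, at the low and the high end.
low_end_gain = gain(0.1, 0.2)
high_end_gain = gain(0.8, 0.9)
print(f"low end: {low_end_gain:.3f}, high end: {high_end_gain:.3f}")
```

Under these assumptions the same increment buys far less at the high end of the scale, which would make an improvement to already well-rated materials hard to detect.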

Study  2 — Test  and  Generality  Consistency  in  a Classification  Task 

Also using method one in the same introductory statistics course,
performance of groups using three types of rule statements was compared:
connected rules, discrete rules, and the regular expository text, which
served as the control. The first two treatments consisted of an integrated
algorithmic flow chart representation for the connected condition and separate
nonintegrated flow chart representations for the discrete condition. The
regular condition did not present a how-to-use-the-rule algorithm of any kind.

On posttest performance, the connected and discrete rule groups scored
significantly higher than the regular instruction group. On affect, all three
groups were different, with the discrete group most positive, the connected
next, and the regular instruction least positive. There were no significant
differences between groups on posttest time or response confidence. It was
concluded that providing the student with an algorithmic rule resulted in a
performance increment. It should be noted that this study also used materials
which had a high initial ISDP rating and that the addition of an algorithmic
representation of the rule was still able to cause a further increment in
performance.




Study 3--Validation of the Instructional Strategy Diagnostic Profile
in Physics 100

The second methodology was used in an introductory physics course
at Brigham Young University. An ISDP analysis of the tests used in the course
indicated that less than 20 percent of the test items met the ISDP
requirements for adequate rule-using. Performance on the rule-using items was
significantly lower than performance on the memory-oriented items. One of
the poorest (as indicated by test performance) units of instruction was
selected for ISDP modification. A revised test was prepared which included
more rule-using items, and the instruction was revised to score higher on
the Profile. During the summer term, students in the course were randomly
assigned to the existing or the revised materials. All students took both
the original and the revised test.

There was a significant difference within groups, indicating that
performance on encountered test items was better than that on unencountered
test items. However, between-group differences, on either type of test,
while in the predicted direction, failed to reach significance. The study
attempted to demonstrate that modification of the syllabus in accordance with
ISDP principles would result in a performance increment. Because of
considerable within-group variance, probably resulting from the uncontrolled influence
of lectures, the textbook, and student interaction, the results failed to
demonstrate the predicted difference.


STUDY  1:  FRAMEWORK  RULE  REPRESENTATION  AND 

ELABORATED  FEEDBACK  IN  STATISTICS 
INSTRUCTION* 

Design  Challenges 

The development of instruction that effectively prepares learners to
use complex rules under low prompt testing conditions presents two
challenges to the instructional designer: (1) finding an appropriate
representation of the rule(s) that facilitates recall, and (2) providing for
appropriate practice and feedback that effectively prepares the learner to
apply the rules on similar test items.

Rule  Representation 

Several sources provide evidence for the necessity of representing
rules in instruction with accompanying mathemagenic information. Landa
(1974) found that the effectiveness and efficiency of student performance
increased when algorithmic representation was used in teaching mathematics
and language rules. Markle (1975) found that a vertical list of attributes
was superior to a paragraph with embedded attributes in the acquisition of
rules. Mayer (1975a) concluded that an assimilative set or framework
facilitated storage and retrieval of rules from memory. Minsky (1974)
suggests that a framework of information facilitates the manipulation of critical
elements when encountering a new situation or instance. Glaser (1976)
advocates finding ways to represent complex information to the novice learner
in such ways that encoding is facilitated and time to criterion competency is
decreased.

Appropriate  Practice 

Practice can be defined as an instructional display that (1) requires
the learner to respond overtly to an explicitly stated task and (2) provides,
at least, correct answer feedback to the learner. Practice is judged to be
appropriate if the task, content, and feedback of the instructional display
are isomorphic to the task and content of the rule representation. It is
assumed in this study that the most effective practice displays should include
framework displays identical to the rule representation, and that feedback
should be the correct answer with elaboration rather than the correct answer
only. Merrill and Wood (1975),¹ Wood, Richards, and Merrill (1976), and
Schmidt, Wood, and Merrill (1976) provide a rationale and some evidence for
creating example and practice displays that are isomorphic to or consistent
with rule displays.


*Study conducted by N. D. Wood, R. M. Gilstrop, and M. D. Merrill.

¹A more extensive definition of the terms used in this section is found
in Merrill and Wood (1975). Space limitations do not allow for a full
elaboration here.


Hypotheses


This study will investigate the effects of (1) representing complex
rules within a mathemagenic framework of information and (2) providing
elaborated feedback to practice displays. The general hypothesis is as
follows:


The framework (mathemagenic) representation of rules
and consistent practice with elaborated feedback will
produce significantly more positive student outcomes than
straight list or nonframework representation of rules
with correct answer only feedback.


Methods 


Selection  of  Subject  Matter 

Glaser and Resnick (1972) have identified the need to do
instructional psychology research within realistic settings with existing curricula.
Based on this perceived need to go beyond the artificial context and subject
matter of the laboratory setting, an ongoing introductory statistics course
at Brigham Young University was selected for the following reasons: (1)
typical subject matter within a typical instructional setting was available,
and (2) complex subject matter conducive to framework rule representation
was an integral part of the course.

A segment or lesson of instruction from a unit of hypothesis testing
for one mean was chosen as the specific subject matter of the study. This
segment included the following six steps:

1. Formulate the null ($H_0$) and alternative ($H_a$) hypotheses.

2. Choose sample size ($n$) and alpha.

3. Choose test statistic.

4. Make decision rule.

5. Calculate test statistic.

6. Make decision.
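
The six steps can be sketched in code for the two-tailed case. This is only an illustration, not part of the study materials: the function name is hypothetical, the critical value is passed in by the caller because it comes from a printed t table, and the sample is the set of Brix measurements from the practice problem reproduced in Figures 2 and 3.

```python
import math
import statistics


def one_mean_t_test(sample, mu_0, t_critical):
    """Two-tailed test of H0: mu = mu_0 against Ha: mu != mu_0 (step 1).

    `t_critical` is the table value for alpha/2 with n - 1 degrees of
    freedom; the caller looks it up (step 4).
    """
    n = len(sample)                            # step 2: sample size
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)               # sample standard deviation
    t = (x_bar - mu_0) / (s / math.sqrt(n))    # steps 3 and 5
    reject = abs(t) > t_critical               # steps 4 and 6
    return t, reject


# Brix measurements from the practice problem; alpha = .05, so the
# table value for alpha/2 = .025 and 9 degrees of freedom is 2.262.
brix = [82.0, 79.6, 78.4, 81.8, 82.2, 79.9, 83.2, 79.9, 82.3, 84.1]
t, reject = one_mean_t_test(brix, mu_0=80.0, t_critical=2.262)
print(f"t = {t:.2f}, reject H0: {reject}")
```

Computed from the raw measurements, t comes out near 2.35, which exceeds 2.262, so the null hypothesis is rejected, in agreement with the worked answer in Figure 1.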

Subjects

A group of O'! students (encouraged by the statistics department to
participate) was given course credit for participation in the experiment.
Most of the students were sophomores and juniors, with a few seniors and
graduate students. A wide diversity of majors was represented by the group.


Treatment  Materials 


The four types of treatment materials were:

1. Framework rule representation with elaborated feedback.

2. Framework rule representation with correct answer feedback.

3. Nonframework rule representation with elaborated feedback.

4. Nonframework rule representation with correct answer feedback.

All four treatment material types were in workbook form and were
randomly distributed to students in the lecture hall where the experiment
was conducted.
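
One way such a random distribution could be carried out is sketched below. The report does not say how the workbooks were randomized, so the balanced-shuffle approach, the function name, and the student identifiers are all assumptions made for illustration.

```python
import random

# The four workbook types: (rule representation, feedback).
TREATMENTS = [
    ("framework", "elaborated"),
    ("framework", "correct answer"),
    ("nonframework", "elaborated"),
    ("nonframework", "correct answer"),
]


def distribute_workbooks(student_ids, seed=None):
    """Hand out workbooks in random order with near-equal group sizes.

    A stack is built by cycling through the four treatments, then
    shuffled, so group sizes differ by at most one (an assumed design).
    """
    rng = random.Random(seed)
    stack = [TREATMENTS[i % len(TREATMENTS)] for i in range(len(student_ids))]
    rng.shuffle(stack)
    return dict(zip(student_ids, stack))


# Twelve hypothetical students; the seed only makes the run repeatable.
assignment = distribute_workbooks(range(1, 13), seed=1977)
for student, (representation, feedback) in assignment.items():
    print(student, representation, feedback)
```

Shuffling a pre-balanced stack keeps the four cells of the two-by-two design the same size, which simple independent coin flips would not guarantee.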

Representative examples of the rules, practice, and feedback displays
used in the four treatments are found in Figures 1 through 4. Figures 1 and
2 show the framework rule representation with elaborated and with correct
answer feedback, respectively; and Figures 3 and 4, the nonframework rule
representation with elaborated and with correct answer feedback.

The treatment condition shown in Figure 4 represents the original
instructional materials used in the course. These materials appeared in
the form of a self-instructional text by Christensen (1974). An agreement
was made with the instructor that any treatment that was considered by means
of the Instructional Strategy Diagnostic Profile (ISDP) (Merrill & Wood, 1975)
as less effective than the original would not be used. Therefore, the
treatments shown in Figures 2 and 3 were considered to be more effective, while
the treatment in Figure 1 was considered to be the most effective.

Instrumentation 


The post-treatment test administered to all students in the experiment
comprised 12 paper-and-pencil, multiple-choice questions of recall, concept
classification, and rule-using types. An example of a rule-using test item
is found in Figure 5. The number of correct answers on the twelve items by
each student was used as the dependent variable performance.

Each question was followed by a seven-point differential scale, which
probed the student's confidence in his answer to the test question. An example
of the confidence scale item is shown in Figure 5. The average confidence for
each student on all items was used as the dependent variable confidence.

The affect that the instruction had on the student was measured five
times during the treatment period, using a seven-question semantic differential
scale format (see Figure 5). The measures were taken after students (1) read
the instructional materials, (2) responded to test items 1 and 2 (memory type),
(3) responded to test items 3, 4, 5, and 6 (concept and rule using), (4)
responded to test items 7, 8, 9, and 10 (concept and rule using), and (5)
responded to test items 11 and 12 (higher order rule using). The word pairs
were scrambled as to order and polarity for the five measures in order to
avoid an anticipation effect. The average score for each student on all five
affect measures was used as the dependent variable affect.
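
The three scored dependent variables can be sketched as follows. This is a hedged illustration: the function names and the example responses are made up, the report does not give its scoring key, and the convention of flipping reverse-polarity word pairs so that a higher number is always more positive is an assumption. For brevity each affect administration below has two items instead of seven.

```python
from statistics import mean

SCALE_POINTS = 7  # the report describes seven-point scales


def score_performance(answers, answer_key):
    """Dependent variable 'performance': number of items answered correctly."""
    return sum(given == correct for given, correct in zip(answers, answer_key))


def score_confidence(ratings):
    """Dependent variable 'confidence': mean rating across all test items."""
    return mean(ratings)


def score_affect(administrations):
    """Dependent variable 'affect': mean over the administrations.

    Each administration is a list of (rating, reversed_polarity) pairs.
    Reverse-polarity word pairs are flipped before averaging (assumed).
    """
    def realign(rating, reversed_polarity):
        return SCALE_POINTS + 1 - rating if reversed_polarity else rating

    per_administration = [
        mean(realign(rating, flipped) for rating, flipped in items)
        for items in administrations
    ]
    return mean(per_administration)


# Made-up responses for one hypothetical student.
answer_key = list("ABCDABCDABCD")
answers = list("ABCDABCDABCA")
performance = score_performance(answers, answer_key)
confidence = score_confidence([7, 6, 6, 5, 7, 4, 6, 6, 5, 7, 3, 2])
affect = score_affect([[(6, False), (2, True)]] * 5)
print(performance, round(confidence, 2), affect)
```

Realigning polarity before averaging matters because the word pairs were scrambled; averaging raw ratings would let reversed items cancel out the others.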


Figure 1. Framework rule representation with elaborated feedback.


Figure 1. (Continued).


Answers to Practice Problem 1

1. FORMULATE $H_0$ AND $H_a$

   $H_0$: $\mu = 80$    $H_a$: $\mu \neq 80$

   Remember, a statement of equality (either $\leq$, $\geq$, or $=$) will
   always appear in the $H_0$. An $=$ sign in the $H_0$ always defines a
   two-tailed test. The $H_a$ is always the complement of the $H_0$.

2. CHOOSE $n$, $\alpha$

   $n = 10$    $\alpha = .05$

   In Step 2, $\alpha$ is given and $n$ is obtained by counting the number of
   observations.

3. CHOOSE TEST STATISTIC

   $t = \dfrac{\bar{x} - 80}{s/\sqrt{10}}$

   In Step 3, $\mu_0 = 80$ and $n = 10$ are substituted into the test
   statistic $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.

4. MAKE DECISION RULE

   The $t$ value (2.262) was obtained as follows (note: $|t|$ stands for
   the absolute value of $t$):

   a. For a two-tailed test, divide $\alpha$ by 2 ($\alpha/2 = .025$);
      df (degrees of freedom) $= n - 1 = 9$.

   b. Obtain the $t$ value from the table using df $= 9$ and $\alpha = .025$.

5. CALCULATE TEST STATISTIC

   $t = \dfrac{81.34 - 80.00}{1.8/\sqrt{10}} = 2.354$

   Looking at the formula from Step 3, we see that the values for $\bar{x}$
   and $s$ are missing. Using the values given in the problem ($\bar{x} = 81.34$
   and $s = 1.8$), we calculate the $t$ value.

6. MAKE DECISION

   Since $t > t_{.025}$ ($2.354 > 2.262$), we reject the $H_0$.

   Looking at the diagram in Step 4, we see that since $2.354 > 2.262$,
   we are in the rejection region of the right-hand tail. We can conclude,
   therefore, that the molasses cannot be graded as high quality.
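
The arithmetic in Steps 4 through 6 can be checked with a short sketch. It uses only the summary values printed in the problem; the variable names are illustrative, and the critical value is simply copied from the figure, not computed.

```python
import math

# Values given in the practice problem.
alpha = 0.05
n = 10
x_bar, s, mu_0 = 81.34, 1.8, 80.0

# Step 4: a two-tailed test splits alpha between the tails.
tail_area = alpha / 2      # .025
df = n - 1                 # 9 degrees of freedom
t_table = 2.262            # table value quoted in the figure

# Step 5: substitute the sample values into the test statistic.
t = (x_bar - mu_0) / (s / math.sqrt(n))

# Step 6: compare |t| with the table value.
reject_h0 = abs(t) > t_table
print(f"t = {t:.3f}, reject H0: {reject_h0}")
# prints "t = 2.354, reject H0: True"
```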


The six steps of this procedure can be remembered more easily if they
are listed in a framework similar to the one below.


1. FORMULATE $H_0$ AND $H_a$

2. CHOOSE $n$, $\alpha$

3. CHOOSE TEST STATISTIC

4. MAKE DECISION RULE

5. CALCULATE TEST STATISTIC

6. MAKE DECISION


These six steps will be discussed in more detail within the above
framework. The examples for illustration and the practice examples will be
presented using this framework in order to assist in learning and remembering
the 6 basic components of hypothesis testing.


Figure 2. Framework rule representation with correct answer feedback.


PRACTICE

In the following problems, use the 6-step procedure discussed in this
section to test the hypothesis for means of normally distributed populations
when $\sigma_x$ is not known:

1. The following are measurements of Brix degrees on molasses: 82.0, 79.6,
78.4, 81.8, 82.2, 79.9, 83.2, 79.9, 82.3, 84.1. In order to be graded as high
quality molasses, the Brix degrees must be equal to 80. At an $\alpha$ value of 0.05,
could the molasses from which the samples were taken be graded as high quality?
For this data, $\bar{x} = 81.34$, $s = 1.8$, and $\sqrt{10} = 3.16$.


Figure 2. (Continued).


Answers to Practice Problems


DEFINITION


A Test for One Mean When $\sigma_x$ Is Not Known. A test of hypothesis for
one mean when $\sigma_x$ is not known is a statistical procedure used to decide
whether or not the mean of a normally distributed population takes on the value
of $\mu_0$. This procedure differs from the one set forth in the previous section
in the test statistic used and in the decision rules employed. In this section s
is used as an estimator of $\sigma_x$. The 6 steps of the procedure are as follows:

1. Formulate $H_0$ and $H_a$. The 3 possible hypotheses for the mean
of a normally distributed population when $\sigma_x$ is not known are:

a. $H_0: \mu \le \mu_0$  vs.  $H_a: \mu > \mu_0$

b. $H_0: \mu \ge \mu_0$  vs.  $H_a: \mu < \mu_0$

c. $H_0: \mu = \mu_0$  vs.  $H_a: \mu \ne \mu_0$

2. Choose a sample size, n, and a value for $\alpha$.

3. Let the test statistic be

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

4. On the basis of the $\alpha$ value, choose the decision rule according
to the decision rule table, table 17.

5. Take the sample, and compute the test statistic.

6. Apply the decision rule, and make the decision.


Figure 3. Nonframework rule representation with elaborated feedback.
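The decision rule table (table 17) referenced in step 4 is not reproduced in this section. As a minimal sketch, assuming the table follows the standard one-mean t-test rules for the three hypothesis pairs listed in step 1, steps 4 and 6 could be expressed in Python as shown below. The function name `decide` and its parameters are hypothetical; the critical values 1.833 and 2.262 are the standard tabled t values for df = 9 at .05 in one tail and .025 in each tail.

```python
def decide(t, alternative, t_one_tail, t_two_tail):
    """Apply a one-mean t-test decision rule (hypothetical helper).

    alternative: "greater"   for Ha: mu > mu0
                 "less"      for Ha: mu < mu0
                 "two-sided" for Ha: mu != mu0
    t_one_tail:  tabled t value for alpha (one-tailed tests)
    t_two_tail:  tabled t value for alpha / 2 (two-tailed test)
    """
    if alternative == "greater":
        reject = t > t_one_tail
    elif alternative == "less":
        reject = t < -t_one_tail
    elif alternative == "two-sided":
        reject = abs(t) > t_two_tail
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return "reject H0" if reject else "accept H0"


# Illustrative values only: a calculated t of 2.0 with df = 9 and alpha = .05.
upper_tail = decide(2.0, "greater", t_one_tail=1.833, t_two_tail=2.262)
two_tailed = decide(2.0, "two-sided", t_one_tail=1.833, t_two_tail=2.262)
print(upper_tail, "|", two_tailed)
```

The same calculated t can lead to different decisions depending on which hypothesis pair was formulated in step 1, which is why the procedure fixes the hypotheses and the decision rule before the sample is taken.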




PRACTICE 


In the following problems, use the six-step procedure discussed in this
section to test the hypothesis for means of normally distributed populations
when $\sigma_x$ is not known:

1. The following are measurements of Brix degrees on molasses: 82.0, 79.6,
78.4, 81.8, 82.2, 79.9, 83.2, 79.9, 82.3, 84.1. In order to be graded as high
quality molasses, the Brix degrees must be equal to 80. At an $\alpha$ value of
0.05, could the molasses from which the samples were taken be graded as high
quality? For this data, $\bar{x}$ = 81.34, s = 1.8, and $\sqrt{10}$ = 3.16.


(1) 

(2) 


(3) 

(4) 

(5) 

(6) 


Figure  3.  (Continued). 




Answers to
Practice Problem 1


(1) $H_0: \mu = 80$   $H_a: \mu \ne 80$

    Remember that a statement of equality (either $\le$, $\ge$, or =) will
    always appear in the $H_0$. An = sign in the $H_0$ always defines a
    two-tailed test. The $H_a$ is always the complement of the $H_0$.


(2) $\alpha$ = .05   n = 10

    $\alpha$ is given and n is obtained by counting the number of observations.


(3) $t = \dfrac{\bar{x} - 80}{s / \sqrt{10}}$

    $\mu_0$ = 80 and n = 10 are substituted into the test statistic
    $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$.


(4) Reject $H_0$ if |t| > 2.262, otherwise accept $H_0$ (note: |t| stands
    for the absolute value of t)

    The |t| (2.262) was obtained as follows:

    a. For a two-tailed test divide $\alpha$ by 2 ($\alpha$/2 = .025)

    b. df (degrees of freedom) = n - 1 = 9

    c. Obtain t-value from table using df = 9 and $\alpha$ = .025

    Rejection Region       Acceptance Region       Rejection Region
      (t < -2.262)                                   (t > 2.262)

    $-t_{.025} = -2.262$                           $t_{.025} = 2.262$


(5) $\bar{x}$ = 81.34   t = 2.354

    To obtain the calculated t-value, it is necessary to look at the formula
    from Step 3. Using the values of $\bar{x}$ and s that were given in the
    problem, we substitute and compute as follows:

    $$t = \frac{81.34 - 80.00}{1.8 / \sqrt{10}} = \frac{1.34}{.569} = 2.354$$


(6) Since 2.354 > 2.262 reject $H_0$

    Looking at the diagram in Step 4, we see that since 2.354 > 2.262, we are
    in the rejection region of the right hand tail. We can conclude, therefore,
    that the molasses cannot be graded as high quality.
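As a check on the worked answer, the six steps can be reproduced from the raw Brix measurements. The sketch below is illustrative only: it uses Python's standard `statistics` module, the variable names are hypothetical, and the critical value 2.262 is taken from the answer to step 4 rather than computed. Because it uses the unrounded sample standard deviation instead of s = 1.8, the calculated t comes out at about 2.35 rather than exactly 2.354.

```python
import math
import statistics

# Step 1: H0: mu = 80 vs. Ha: mu != 80 (two-tailed test).
mu_0 = 80.0

# Step 2: alpha = .05; n is obtained by counting the observations.
brix = [82.0, 79.6, 78.4, 81.8, 82.2, 79.9, 83.2, 79.9, 82.3, 84.1]
n = len(brix)

# Step 4: tabled t value for df = n - 1 = 9 and alpha / 2 = .025.
t_critical = 2.262

# Steps 3 and 5: t = (x_bar - mu_0) / (s / sqrt(n)).
x_bar = statistics.mean(brix)
s = statistics.stdev(brix)  # sample standard deviation (n - 1 divisor)
t = (x_bar - mu_0) / (s / math.sqrt(n))

# Step 6: apply the decision rule.
decision = "reject H0" if abs(t) > t_critical else "accept H0"
print(f"n = {n}, x_bar = {x_bar:.2f}, s = {s:.2f}, t = {t:.2f}: {decision}")
```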




DEFINITION 


A Test for One Mean When $\sigma_x$ Is Not Known. A test of hypothesis for
one mean when $\sigma_x$ is not known is a statistical procedure used to decide
whether or not the mean of a normally distributed population takes on the value
of $\mu_0$. This procedure differs from the one set forth in the previous section
in the test statistic used and in the decision rules employed. In this section
s is used as an estimator of $\sigma_x$. The six steps of the procedure are as follows:

1. Formulate $H_0$ and $H_a$. The three possible hypotheses for the mean
of a normally distributed population when $\sigma_x$ is not known are:

a. $H_0: \mu \le \mu_0$  vs.  $H_a: \mu > \mu_0$

b. $H_0: \mu \ge \mu_0$  vs.  $H_a: \mu < \mu_0$

c. $H_0: \mu = \mu_0$  vs.  $H_a: \mu \ne \mu_0$

2. Choose a sample size, n, and a value for $\alpha$.

3. Let the test statistic be

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

4. On the basis of the $\alpha$ value, choose the decision rule according
to the decision rule table, table 17.

5.  Take  the  sample,  and  compute  the  test  statistic. 

6.  Apply  the  decision  rule,  and  make  the  decision. 

Figure  4.  Nonframework  rule  representation  with  correct  answer  feedback. 




PRACTICE


In the following problems, use the 6-step procedure discussed in this
section to test the hypothesis for means of normally distributed populations
when $\sigma_x$ is not known:

1. The following are measurements of Brix degrees on molasses: 82.0, 79.6,
78.4, 81.8, 82.2, 79.9, 83.2, 79.9, 82.3, 84.1. In order to be graded as high
quality molasses, the Brix degrees must be equal to 80. At an $\alpha$ value of 0.05,
could the molasses from which the samples were taken be graded as high quality?
For this data, $\bar{x}$ = 81.34, s = 1.8, and $\sqrt{10}$ = 3.16.


(1)


(2)


(3)


(4)


(5)


(6)


Figure  4.  (Continued). 




Answers to Practice Problems


Lesson  2 Section  2 


1.


(1) $H_0: \mu = 80$   $H_a: \mu \ne 80$

(2) $\alpha$ = .05, n = 10

(3) $t = \dfrac{\bar{x} - 80}{s / \sqrt{10}}$

(4) Reject $H_0$ if |t| > 2.262, otherwise accept $H_0$ (note: |t| stands for
    the absolute value of t).

(5) $\bar{x}$ = 81.34   t = 2.354

(6) Since 2.354 > 2.262 reject $H_0$.


Figure  4.  (Continued). 




The effect of workers using new tools on the number of circuit boards
assembled in an electronics plant is being tested. A random sample of
individual workers' production totals is taken: 1, 1, 6, 5, 7, 8, 2, 2, 4, 5,
7, 3, 6, 4. The average number of circuit boards for this sample is 4.5 with a
standard deviation of 2. The plant manager wants to know if the sample
average of 4.5 is statistically different from the previous average
of 5.5. Let $\alpha$ = .05.

(Use this area for work space.)

(Mark an X in the box that corresponds to the best answer for each test item
below.)

7. The appropriate formulation of $H_0$ and $H_a$ for the above problem is:

a. $H_0: \mu \le 5.5$  vs.  $H_a: \mu > 5.5$

b. $H_0: \mu = 5.5$  vs.  $H_a: \mu \ne 5.5$

c. $H_0: \mu \le 4.5$  vs.  $H_a: \mu > 4.5$

d. $H_0: \mu \ge 4.5$  vs.  $H_a: \mu < 4.5$

e. $H_0: \mu = 4.5$  vs.  $H_a: \mu \ne 4.5$

f. $H_0: \mu \ge 5.5$  vs.  $H_a: \mu < 5.5$

How  confident  are  you  in  your  answer  to  the  above  question? 


Very
confident


Not at all
confident


Figure 5. Dependent variable measures including an example of a
rule-using test item, a confidence scale, a semantic
differential affect scale, and a time record space.




interesting : : : : : : boring

worthless : : : : : : beneficial

complete : : : : : : incomplete

detestable : : : : : : enjoyable

clear : : : : : : confusing

relevant : : : : : : irrelevant

redundant : : : : : : concise

Please record the time on the clock:

Figure  5.  (Continued) 




Measures of the amount of time spent by individual students on
sections of the treatment materials were taken at the same five points
as the affect measures. Each student wrote down the time from the wall
clock in the space provided (see Figure 5). The total elapsed time taken
during the treatment period by each student was used as the dependent
variable time.

Design

A 2 x 2 factorial design with a multivariate analysis of variance
(MANOVA) was used to assess the effects of treatments across all four
dependent variables. A generalized ANOVA program which adjusted for unequal
cell sizes (Bryce & Carter, 1974) was used to analyze the data.

Results


The following four hypotheses were tested:

1. Hypothesis 1

Performance scores will be higher for the framework rule/elaborated
feedback treatment group than for the nonframework/correct answer treatment
group.

2. Hypothesis 2

Confidence scores will be higher for the framework rule/elaborated
feedback treatment group than for the nonframework/correct answer treatment
group.

3. Hypothesis 3

Affect scores will be higher for the framework rule/elaborated
feedback treatment group than for the nonframework/correct answer treatment
group.

4. Hypothesis 4

Total elapsed time will be less for the framework rule/elaborated
feedback treatment group than for the nonframework/correct answer feedback
treatment group.

A multivariate analysis of variance (MANOVA) was performed simultaneously
on all four dependent variables (performance, confidence, affect, time) as a
control for an increase in Type I error through repeated univariate tests.

The MANOVA F-test for the full 2 x 2 factorial model was not significant:
F(4,86) = 0.697, p > .05. The means and standard deviations are reported in
Table 1, and the respective F ratios, in Table 2.
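The concern behind running a single MANOVA first is that repeated univariate tests inflate the overall chance of a Type I error. As a rough illustration, assuming for simplicity that the four univariate tests were independent (an assumption made only for this sketch; the four measures are probably correlated), the chance of at least one false rejection can be computed as follows. The function name is hypothetical.

```python
def familywise_error_rate(alpha, number_of_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** number_of_tests


# Four dependent variables (performance, confidence, affect, time),
# each tested separately at alpha = .05.
rate = familywise_error_rate(0.05, 4)
print(f"{rate:.3f}")
```

Under that assumption the overall error rate rises from .05 to roughly .19, which is the increase the simultaneous test is meant to guard against.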


Table 2

Summary of Univariate F-Ratios on Four
Dependent Variables


                                    Dependent Variable
Source of Variation      Performance   Confidence   Affect    Time

Rule Representation          .26          1.06        .25     1.73

Feedback                    1.44           .23       1.64     .003

R x F                        .69           .03       2.34     .122

Note. All F-ratios were based on df = 1,89 and $\alpha$ = .05.
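The F-ratios in Table 2 can be screened against a critical value in a few lines. This is only a sketch: the critical value of about 3.95 for df = 1,89 at .05 is an approximation read from a standard F table, not a figure given in the report, and the dictionary layout and names are hypothetical.

```python
# Approximate tabled F for df = 1, 89 at alpha = .05 (assumption, not from the report).
F_CRITICAL = 3.95

# F-ratios as read from Table 2.
f_ratios = {
    "Rule Representation": {"performance": 0.26, "confidence": 1.06, "affect": 0.25, "time": 1.73},
    "Feedback": {"performance": 1.44, "confidence": 0.23, "affect": 1.64, "time": 0.003},
    "R x F": {"performance": 0.69, "confidence": 0.03, "affect": 2.34, "time": 0.122},
}

# Collect every (source, dependent variable) pair whose F exceeds the critical value.
significant = [
    (source, variable)
    for source, row in f_ratios.items()
    for variable, f_value in row.items()
    if f_value > F_CRITICAL
]
largest = max(f for row in f_ratios.values() for f in row.values())
print(significant, largest)
```

No entry exceeds the critical value; the largest ratio, 2.34 for the interaction on affect, is still well below it.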

Discussion


The mean scores on performance, confidence, and affect, as well as the
total elapsed time for treatment, were chosen as the level of measurement,
since more precise analysis (breaking each variable out into smaller
categories) yielded no additional information.

The  consistent  lack  of  significant  differences  across  the  design  in  the 
study  may  be  due  to  one  or  more  of  the  following  reasons: 

1. The original version of the instruction (Figure 4) was considered,
by means of an ISDP analysis, to be superior to any other available
printed-format material on the subject. It is assumed that a less effective
treatment (e.g., embedded rules in text, partial or no procedural helps for using
the rule) would have assisted in creating differences between groups; in
other words, by definition, the treatments were very similar.

2. The 2 hours allowed to the experimenters for the treatment period
was judged to be insufficient for the complexity of the subject matter involved.
The amount of information to be processed was probably too much for students,
regardless of the treatment condition. It is assumed that the net effect of
this time constraint drastically reduced the between-group variance that
otherwise might have existed.

Additional research efforts might (1) create greater differences in
treatments by embedding (or making less mathemagenic) critical attributes, and (2)
allow for more time on task to determine if between-group variance can be
increased.

If complex rules can be represented to the learner in ways that will
develop skills of competent recall and use (application) in realistic test
situations, a valuable tool for the instructional developer to increase
the effectiveness and reduce the cost of instruction could be made available.




STUDY  2: 


TEST  AND  GENERALITY  CONSISTENCY  IN  A 
STATISTICS CLASSIFICATION TASK^


Problem 


Analyses of tests often indicate a required test performance that is
not entirely consistent with the associated instruction. The assumption
of this study is that instruction should present the student with both the
content of and the behavior required for performance on a subsequent test.

One of the components of test-instruction consistency is a congruence
between the test and the generality (statement of rule, definition, or
proposition upon which the instruction is centered). Though some evidence
exists to indicate that a generality impacts positively upon performance
(Merrill, Olsen, & Coldeway, 1976), the effect of test-generality isomorphism
has apparently not been specifically tested.

A long time ago, Yum (1931) found that a slight change in stimulus
properties from instruction to test resulted in a significant decrement of
successful responses on test performance. Researchers have been slow in
extending this sort of tightly controlled paired-associate study into the
more complex levels of instructional application (Glaser & Resnick, 1972).
At least part of the reason for this slow pace was summarized by Stake (1973),
who stated that neither scales nor grounds have been developed for describing
test and instruction similarity, though he cites some progress being made
(e.g., Anderson, Goldburg, & Hidde, 1971). Anderson (1972) recognizes the
problem in a different way when he suggests that achievement tests are based
on "things" not clearly and consistently defined. Gropper (1970) has made
some inroads, indicating an influence of spatial organization of materials
upon student response. Mayer (1975b) has noted a forward processing effect
that shows a relationship between the kind of stimulus materials used in
instruction and the test response.

Scandura's use of the algorithm and higher- and lower-order rules in
instruction (Ehrenpreis & Scandura, 1974; Scandura, 1970, 1973, 1974) stresses
the importance of specifying the precise behaviors requested of the learner.
Shoemaker (1975) echoes this when he speaks for having identical elements in
both instruction and test items. Gropper (1976) takes the position that task
and content post-instructional test analysis should include the same taxonomic
categories as the "front-end" instruction to effectively diagnose learning
failures.

Landa (1974) concludes that students have difficulties in solving
unencountered examples because the general rules necessary for identifying specific
solution rules are unidentified and not taught. When this inconsistency is
resolved, the integration of separate rules is facilitated, and errors decrease
rapidly over a relatively short period of subsequent instruction.


^Study  conducted  by  R.  V.  Schmidt,  N.  D.  Wood,  and  M.  D.  Merrill. 




All the aforementioned studies point to a felt need, and some evidence,
that what is tested should have been presented previously to the student
(though the specific instances should differ). Just how close this match
should be is open to question. A study by Scandura and Durnin (1968)
indicates that a minor shift away from test-generality isomorphism is
perhaps desirable.

Merrill and Wood (1974; 1975), in their Instructional Strategy Diagnostic
Profile (ISDP), have taken up Stake's challenge. They have provided scales
and are continuing to establish grounds for describing and evaluating
concept-level instructional materials. Wood, Richards, and Merrill (1976) have
developed and validated a measure of test-instruction similarity with selected
constructs from the ISDP.

This study deals specifically with an assumption made in the ISDP that
the test and generality should be consistent. It compares performance of
students given generalities that differ in three ways in the degree to which
they are consistent with the content and behaviors requested by the test
items. First, a generality can present the student with content without
presenting the precise task behaviors he will be asked to perform. (This
does not mean that no required behavior is taught or implied, but that the
specific behavior required is not taught.) This is a low level of consistency.
Second, a generality can present the task conditions under which the student
will be asked to perform, introducing separate generalities for each task
making up a larger task. We call this "discrete rule" consistency, which
requires that the mode of behavior be consistent (recall tested with recall,
rule-using tested with rule-using tasks). Thus, a student is working with
consistent discrete rules when an item of information he is asked to learn is
taught and tested in recall mode or when a concept he is asked to learn is
taught and tested with rule-using behaviors. If the item is taught with a
rule-using behavior and tested in a recall mode, the test and instruction are
inconsistent. The third way in which we looked at generality-test consistency
involves task sequencing. This is "connected generality" consistency. Gagné
(1970) and Mechner (1967) discuss this level when they describe "behavior
chains." A test that asks the student to perform sequential discrete tasks
in a way which is not presented in rule or practice form lacks what may be
an important consistency characteristic.
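The "discrete rule" consistency requirement amounts to a simple matching check between the mode in which an item is taught and the mode in which it is tested. The sketch below is one hypothetical way to express that check; the `Mode` type, the function name, and the three example items are illustrative assumptions and do not come from the ISDP itself.

```python
from enum import Enum


class Mode(Enum):
    RECALL = "recall"
    RULE_USING = "rule-using"


def is_mode_consistent(taught: Mode, tested: Mode) -> bool:
    """Discrete rule consistency: the tested mode must match the taught mode."""
    return taught is tested


# Hypothetical items: (label, mode used in instruction, mode used on the test).
items = [
    ("state the six steps", Mode.RECALL, Mode.RECALL),
    ("select the hypothesis test", Mode.RULE_USING, Mode.RULE_USING),
    ("select the test statistic", Mode.RULE_USING, Mode.RECALL),
]

# Flag items whose test mode differs from the mode used in instruction.
inconsistent = [
    label for label, taught, tested in items
    if not is_mode_consistent(taught, tested)
]
print(inconsistent)
```

Only the third item is flagged, because it is taught with a rule-using behavior but tested in recall mode. Connected generality consistency concerns the sequencing of tasks and is not captured by this per-item check.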

Hypothesis 

Under  the  assumption  that  a generality  is  best  that  is  consistent  with 
the  required  terminal  performance,  we  proposed  the  following  hypothesis: 

Performance on test items, affect toward instruction,
confidence in test item answers, and total elapsed time will be
higher for students experiencing connected generality consistency
treatment than for students experiencing discrete rule consistency;
and these measures for both the connected and discrete rule
consistency treatments will be higher than for the content-only
consistency group.

Four separate hypotheses corresponding to four dependent variables result
from the above general statement. Each will be treated separately in the
reporting of results.




Method 


Subject Matter Content

An introductory statistics course at Brigham Young University (BYU)
was selected as the experimental situation for this study. The specific
matter of hypothesis testing was chosen because generally low scores from
past achievement tests indicated a high level of difficulty with the topic.

The  subject  matter  met  the  experimental  specifications  of  having  multiple 
rules  that  could  be  taught  as  separate,  discrete  rules  or  as  connected 
rules.  The  following  hypothesis  tests  were  covered  in  the  selected  unit  of 
instruction  (Christensen,  197^,): 

1. Test for one mean when $\sigma$ is known.

2. Test for one mean when $\sigma$ is not known.

3.  Test  for  two  means  when  the  samples  are  independent. 

4.  Test  for  two  means  when  observations  are  paired. 

5.  Test  for  one  proportion. 

6.  Test  for  two  proportions. 

7.  Chi-square  test. 

8.  Multinomial  test  of  hypothesis. 

Subjects 

The  subjects  were  95  regular  enrollees  in  a college  statistics 
undergraduate  course.  The  course  serves  as  one  of  the  choices  for  fulfilling 
a general  education  requirement  at  BYU.  Students  received  credit  in  the  form 
of  additional  points  toward  the  final  course  grade  for  participating  in  the 
2-hour  session  and  were  informed  that  failure  to  participate  would,  in  effect, 
penalize  them,  although  the  additional  points  were  not  dependent  upon  their 
performance. 

Treatments 


The  study  consisted  of  three  treatments.  Students  in  the  first 
treatment  group  received  a connected  generality  in  the  form  of  an  algorithm 
which  presented  both  the  content  operation  and  the  task  necessary  to  take  a 
student  from  the  reading  of  the  verbally  stated  problem  to  the  correct  test 
of  hypothesis  and  test  statistics  (see  Figure  6).  Subsequent  practice  pro- 
vided two  examples  of  each  type  of  hypothesis  test  and  test  statistic. 
Correct  answer  feedback  was  provided  on  the  reverse  side  of  the  practice 
pages.  (See  Figure  7 for  a sample  practice  question.) 




Figure  6.  Connected-rule  algorithm  for  selection  of  hypothesis  test 


13. A new social studies program is supposed to produce significantly
better results than a program it is to replace. Students in the
course are matched on the basis of sex, IQ and G.P.A. and then the
pairs are divided into "new method" and "old method" groups. Their
scores on a final achievement test are taken as evidence of
performance.


□ a. Select the number for the appropriate test type from list A.
□ b. Select the number for the correct test statistic from list B.


A-Type of Test

1. test for one mean, t distribution

2. test for two means, dependent

3. test for one proportion

4. test for two proportions

5. none of the above


B-Test Statistic

1. $\dfrac{\bar{x} - \mu_0}{s_{\bar{x}}} = \dfrac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$

2. $\dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu_0}{\sigma_x / \sqrt{n}}$

3. $\dfrac{\bar{d} - D}{s_{\bar{d}}} = \dfrac{\bar{d} - D}{s_d / \sqrt{n}}$

4. [not legible in the source]

5. none of the above


Figure  7.  A sample  practice  question. 




Students in the second treatment condition were presented the
discrete (unconnected) multiple generalities used in determining the correct
test of hypothesis and test statistic. These consisted of a walk-through
of separate, very simple algorithms which took the student to the appropriate
test after the student's initial decision as to the specific type of data he
was working with (see Figure 8). Students in this group were not given any
directed strategy for connecting these behaviors or for using them as part
of an overall process to help them make their initial decisions as to the
nature of the statistical problem. Subsequent practice provided the student
with two samples of each type of decision. Correct answer feedback was
provided on the reverse side of the practice pages.

The instructional materials for the third treatment condition
consisted of the regular text used in the course and directions for providing
appropriate practice. The practice directions consisted of a sample item of
the type used in the posttest with instructions to practice the selected
items found at the end of textbook sections. No generality was provided for
analyzing the type of test that a verbal practice item may pose. The text
also did not help here, for each test of hypothesis was presented in a
separate lesson, gave practice only in a stated kind of hypothesis, and did
not request students to differentiate on the basis of kinds of test. As all
students had previously been exposed to this material, this group became
essentially a control group.

In the previous study (Study 1 of this report), it was hypothesized
that time would decrease as a result of our treatments. In this situation,
although it is desirable to reduce the amount of time students take on
instruction, it is expected that time will increase, based on Bloom's (1974)
observation that quality instruction initially takes longer, especially if
the effect the instruction has on a student and confidence in mastery
of the subject matter increase.**

Instrumentation


A brief three-question pretest was administered to get some measure
of student entry behavior. The pretest was identical in form to the questions
used in practice (where given) and to the posttest. Students were also asked
to indicate lectures attended, materials read, and workbook practice completed
in regard to the unit on hypothesis testing.

Measures for the four dependent variables of interest in the study were
provided for in the treatment materials and posttest and are discussed below.

Performance. The basic performance task required the student to
select (1) the appropriate hypothesis test for a problem statement, and (2)
the appropriate statistical test associated with the hypothesis test (see
Figure 9). The 22 multiple-choice questions provided for 44 responses.
However, only the hypothesis test responses were used in the data analysis.


**Bloom does not clearly define quality instruction except for characterizing
it as mastery learning.




5.


Senator Incorig feels that the bills introduced by members of his party
in Congress will be given a positive or negative vote on the basis of party
affiliation. To test this he assesses the bills over a month's time,
keeping track of the party affiliation of those who took part in the voting
(Democratic or Republican) and what the vote was ("For" or "Against" or
"Abstain").


□ a.  Select  the  number  for  the  appropriate  test  type  from  list  A. 

□ b.  Select  the  number  for  the  correct  test  statistic  from  list  B. 

How  confident  are  you  in  your  answers  to  the  above  question? 


Very 

Confident 


Not  at  all 
Confident 


A-Type of Test

1. test for homogeneity

2. test for two means, dependent

3. test for two means, independent

4. test for two proportions

5. none of the above


B-Test Statistic

[Items 1 through 4 are test statistic formulas that are not legible in the source.]

5. none of the above


Do  not  return  to  this  page  once  you  have  completed  your  answers. 


Figure  9.  A sample  of  the  basic  performance 
task  and  confidence  rating. 


The  test  was  designed  to  have  more  items  than  most  students  could 
complete  in  the  allotted  time  so  that  differences  in  time  and  number  of 
items  completed  for  the  separate  treatment  groups  could  be  ascertained. 

Affect. Questions at the end of the treatment materials and at
the end of the posttest allowed students to respond to a five-category
continuum of general affect in terms of how well the instruction provided
preparation for performance on a test.

Confidence. A seven-point, semantic differential scale was included
after each of the 22 test items in order to assess the amount of
self-perceived confidence students had in their answers to the multiple-choice
questions (see Figure 9).

Time.  All  students  were  to  mark  the  time  from  the  wall  clock  in  a 
space  provided  at  (1)  the  point  where  they  finished  the  first  11  items  on 
the  posttest  and  (2)  at  the  end  of  the  posttest  session. 

Procedure 


Students  were  randomly  assigned  to  one  of  the  three  treatment 
conditions.  After  the  pretest,  students  were  told  there  would  be  three 
timed  sessions,  and  they  were  requested  not  to  begin  any  one  of  them  until 
asked  to  do  so.  They  were  also  informed  that  the  materials  provided  would 
be collected before the test. Finally, they were informed that they would
be provided with more items in each section than they would most likely have
time  to  finish  and  that  they  should  work  steadily  but  carefully. 

The  students  were  given  1/2  hour  for  the  study  session.  They  were 
then  requested  to  move  on  to  the  practice  but  were  allowed  to  return  to  the 
study  materials  if  they  wished.  The  practice  session  lasted  40  minutes,  after 
which  the  students  recorded  their  sense  of  preparedness  and  attitude  toward 
the  materials  used.  All  materials  were  collected.  The  tests  were  then  passed 
out,  and  the  students  were  given  30  minutes  to  work  the  22  problems.  Time  was 
recorded  on  each  test  after  the  eleventh  question  and  again  at  the  end  of  the 
test.  Students  responded  to  a second  affective  measure,  and  the  materials 
were  collected. 

Design 

The three treatment groups provided three levels of the main effect,
level of generality consistency. A one-way analysis of variance design
provided the statistical model for both a univariate (ANOVA) and a multivariate
(MANOVA) analysis of variance with two orthogonal contrasts for comparing means
(control vs. the other two groups for 1 df and connected rule vs. discrete
rule group for 1 df). Each of the five dependent variables was considered
simultaneously in a MANOVA to correct for Type I error. ANOVA results
on single dependent variables were then interpreted if the exact F-ratios
from the MANOVA contrasts warranted further consideration. A generalized
analysis of variance computer program which adjusted for an unbalanced design
was used (Bryce & Carter, 1974).




Results


Hypothesis 1

Performance on test items for the connected rule consistency
group will be higher than that for the discrete rule consistency
group, and performance for both groups will be higher than the
content only (low consistency) group.

Means and standard deviations on performance scores for the three
treatment groups are found in Table 3. A univariate analysis of variance and an
orthogonal comparison of means (Control vs. Connected and Discrete, 1 df;
Connected vs. Discrete group, 1 df) indicated a significant difference
between both the connected rule and discrete rule consistency groups as
compared to the control group: F(2,92) = 4.21, p < .05. There was no
significant difference between the connected rule and discrete rule
consistency groups.
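The two planned comparisons can be illustrated with the performance means reported in Table 3. The sketch below is an assumption-laden illustration, not the report's analysis: the contrast weights (1, 1, -2) and (1, -1, 0) are conventional choices that the report does not state, the names are hypothetical, and the simple cross-product check holds for equal cell sizes, whereas the report's program adjusted for the unequal cell sizes of 33, 32, and 30.

```python
# Performance means as read from Table 3.
means = {"connected": 10.42, "discrete": 10.56, "control": 9.10}

# Assumed contrast weights (not stated in the report).
contrasts = {
    "treatments_vs_control": {"connected": 1, "discrete": 1, "control": -2},
    "connected_vs_discrete": {"connected": 1, "discrete": -1, "control": 0},
}


def contrast_estimate(weights, group_means):
    """Weighted sum of group means for one contrast."""
    return sum(weights[group] * group_means[group] for group in group_means)


c1 = contrasts["treatments_vs_control"]
c2 = contrasts["connected_vs_discrete"]

# Each contrast's weights sum to zero, and the cross-product of the two
# weight sets is zero, which makes them orthogonal under equal cell sizes.
weight_sums = (sum(c1.values()), sum(c2.values()))
cross_product = sum(c1[group] * c2[group] for group in means)

estimates = {name: contrast_estimate(w, means) for name, w in contrasts.items()}
print(weight_sums, cross_product, estimates)
```

The first estimate (about 2.78) reflects the gap between the two treatment groups and the control group, while the second (about -0.14) reflects the small difference between the connected and discrete rule groups, in line with the pattern of significance reported above.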


Table 3

Means and Standard Deviations for
Dependent Variables by Treatment Group


                                         Treatment Group
                         Connected Rule    Discrete Rule       Control
                             (N=33)            (N=32)           (N=30)
Dependent Variable        Mean    S.D.      Mean    S.D.     Mean    S.D.

Performance              10.42    .54      10.56    .54      9.10    .56

Affect Toward             3.06    .06       3.38    .07      2.92    .07
Instruction

Confidence in             3.79    .23       4.19    .23      3.60    .24
Answers to
Test Items

Time on                  24.39    .31      23.84    .32      23.9    .33
Posttest

Hypothesis  2 

Affect  toward  instruction  will  be  higher  for  the  connected 
rule  consistency  group  than  for  the  discrete  rule  consistency 
group,  and  affect  for  both  groups  will  be  higher  than  for  the 
content  only  (low  consistency)  group. 




Means and standard deviations on affect toward instruction after the
treatment condition are found in Table 3. A univariate analysis of
variance and an orthogonal comparison of means indicated that all three
groups were significantly different from each other with the discrete rule
consistency group highest, the connected rule consistency group next highest,
and content only group lowest in affect: F (2,92) = 11.21, p < .05.

Hypothesis  3 

Confidence  in  answers  to  test  items  will  be  higher  for  the 
connected  rule  consistency  group  than  for  the  discrete  rule 
consistency  group,  and  both  groups  will  be  higher  in  confidence 
than  the  content  only  (low  consistency)  group. 

Means  and  standard  deviations  for  confidence  in  answers  to  test  items 
are  found  in  Table  3.  A univariate  analysis  of  variance  and  an  orthogonal 
comparison  of  means  indicated  no  significant  difference  between  any  of  the 
three treatment groups: F (2,92) = 1.60, p > .05.

Hypothesis  4 

Time to complete rule-using posttest items will be longer
for  the  connected  rule  consistency  group  than  for  the  discrete 
rule  consistency  group,  and  both  groups  will  take  longer  than 
the  content  only  group. 

Means  and  standard  deviations  for  time  to  complete  the  posttest  are  found 
in  Table  3.  A univariate  analysis  of  variance  and  an  orthogonal  comparison 
of means indicated no significant difference between any of the three
treatment groups: F (2,92) = .91, p > .05.

Discussion 


The study investigated the extent of the need for test items to be
consistent with their generalities in content representation and in task behaviors
on both a discrete and connected rule level. The results indicated that
students who learned from materials that were consistent only on the content
representation level had significantly lower scores and affect than did students
whose instruction also was consistent with the test item on the task behavior
level. Neither confidence nor time was significantly different across
treatments.

The specific constraints of the study may have obscured greater differences,
especially the hypothesized differences between the discrete rule and connected
rule consistency groups. The time we could arrange demanded that we run the
entire study (from introduction to instruction to practice to test) in one
2-hour sitting. Thus, the students in the connected rule consistency group,
who had the most new materials to learn, had very little time to encode the
rather lengthy algorithm. Given more time, we could have explicitly taught
them the three or four major steps involved in the algorithm before presenting
them with all the detail and, thus, allowed for easier chunking (Miller, 1956) of
the materials. As it was, the discrete and connected rule consistency groups
may have responded more in a forward processing manner (Mayer, 1975b) in which
they did as well as they did based upon the expectations aroused, not only
by the initial statement of the terminal behavior but also by the on-going
practice. Familiarity also possibly affected the results. The students
had worked with materials similar to those provided to the discrete rule
consistency group, while the algorithmic approach was not a tool familiar to
the course. A study that allows adequate time for encoding of the materials,
preferably run with several meetings of the groups, should be made.

Despite the constraints of the study, the presence of test-generality
consistency beyond a simple content-only level clearly resulted in better
test performance and affect. Teachers and developers would do well to give
practice in the specific behaviors required by a terminal task.




STUDY 3: VALIDATION OF THE INSTRUCTIONAL STRATEGY
DIAGNOSTIC PROFILE IN PHYSICS 100¹

Overview  and  Hypotheses 

The present study assumed that already designed and individualized
materials on concept level tasks could be further upgraded by an
example-practice-feedback sequence for each generality, in accordance with the
hypotheses stated in Merrill and Wood (1975). The ISDP also supports the
generally accepted principle that test items for classification or
rule-using tasks should consist of unencountered instances. Much instruction
ignores this dictum. Thus, this study analyzed the test question type for
the materials used in order to compare student performance on both encountered
and unencountered instance items and to ascertain differences in student
performance when the study materials are upgraded.

The hypotheses were based on the assumption that an increase in the degree
to which tests and instruction follow the principles prescribed in the ISDP
results in higher scores on tests with unencountered instance items and
comparable scores on tests with previously encountered instance items.

Methods 




Subject  Matter  Content 

The  subject  matter  consisted  of  six  units  (comprising  the  second 
quarter  of  the  course)  for  an  introductory  physics  course  at  Brigham  Young 
University.  There  were  several  reasons  for  selecting  this  subject  matter: 

1. The course met the criterion that it be conceptually based, a
quality the designers and instructors of the course desired.

2. The course, and especially the student study guide, was already
carefully designed and yet showed deficiency in one or more of the areas
measured by the ISDP. Too often researchers and theoreticians have been
accused of shooting down straw men as they compare materials they had developed
in a hypothesized better way against haphazardly presented "undesigned" lessons.

3. The materials covered a fairly broad range of topics. It is often
simpler  to  create  differences  in  a brief  one-shot  segment  of  material.  Under 
such  a condition,  the  novelty  of  the  approach — and  its  brevity — go  hand-in-hand 
to  generate  unusually  strong  attention  to  the  task.  To  get  at  real  differences, 
it  seems  necessary  to  have  materials  used  over  time. 

4. There is a real and substantial challenge to show differences in
the less neat and varied world of the on-going class rather than in the isolated
laboratory setting (Glaser & Resnick, 1972). Our job is to show that the
efforts which go into materials designed according to stated hypotheses result
in real-life differences.


¹Study conducted by M. D. Merrill, R. V. Schmidt, and R. F. Norton.




5. The course was an introductory course that serves as one of
several options to fulfill a general education requirement. Thus, the
students represented a broad spectrum of college undergraduates with a
diversity of interests and skills.

6. Data on students enrolled in the course indicate that, during
the Fall Semester of 1975, two-thirds of the students either withdrew
unofficially from the course or received a grade of incomplete.
Student-pacing problems obviously contributed to these results, but problems no doubt
existed with the tests and instruction as well.

The  material  selected  covered  the  following  topics: 

1.  Motion  and  Forces. 

2. Forces in Fluids at Rest.

3. Pressure in Moving Fluids.

4. Conservation of Energy.

5. Kinetic Theory of Matter.

6. Law of Increasing Entropy.

Subjects 

The  subjects  were  43  students  from  two  sections  of  a summer  session 
of  the  above-mentioned  course,  which  fulfilled  part  of  the  general  education 
requirement  in  the  physical  sciences  at  the  university.  Other  subjects, 
representing  repeating  students  and  students  who  were  not  present  during  the 
initial  phase  of  the  study,  were  too  few  in  number  within  their  groups  to 
analyze  meaningfully. 

Treatments 


The two treatments were: (1) the regularly constituted class study
guide and (2) a study guide whose generalities were reinforced with
example-practice-feedback segments according to ISDP principles. Moreover, eight
unencountered instance item questions were added to the regular seven-item
test, which consisted entirely of encountered instance or generality items.
Each question had several parts to it, and there were four versions of the
test.


Procedure 


A preliminary study of the nature of the test question and the
corresponding student performance (see the appendix) was run in order to ascertain
which tests could be upgraded through eliminating previously encountered
instance items and adding unencountered instance items.² This evaluation also
allowed selection of a unit on which students showed problems in test
performance. Following this, both the tests and material were upgraded according
to the principles of the ISDP.


²A description of this study, which was conducted by M. D. Merrill,
R. F. Norton, and R. V. Schmidt, is provided in the appendix.




Students attending class the first week were randomly assigned to
the treatments. They were requested not to study with or share their
materials with anyone whose materials did not match theirs. The visual
difference in materials was immediately apparent. In this on-going
situation, studying together had to be allowed. The randomization should have
taken care of any students who might have collaborated using different
materials. This was checked later through a questionnaire: two students
indicated that they had looked briefly at or studied with the treatment
materials they were not assigned.

Students could take the 15-item short answer essay test at a testing
center at their own convenience. The experimenters picked up the tests from
the regular graders on a daily basis, regraded them blindly, and returned
them the next day to the testing center for distribution, keeping copies of
each exam for further reference.

Students  not  in  attendance  the  first  week  (N  = 6)  and  students  retaking 
the course (N = 7) also took the test, but their numbers were insufficient to
allow  an  analysis  of  their  performances. 

The  amount  of  time  required  for  taking  the  test  was  also  recorded,  and 
an  affective  questionnaire  was  administered  after  the  completion  of  this  phase 
of  the  course  to  see  if  there  were  any  general  differences  between  the  two 
groups. 

Design 

The design was a post-test-only design with subjects nested in materials
but crossed with item type. Two levels of the main effect ("regular" and
"upgraded" materials) and four dependent variables (scores on encountered
instance items, scores on unencountered instance items, time on test, and
affect) were considered. A two-way analysis of variance design provided the
statistical model for a univariate analysis of variance to test subject
performance. Rummage, a generalized analysis of variance computer program that
handles an unbalanced design and adjusts for other effects, was used in the
analysis. A t-test was used to analyze time data, as not all tests carried
this information.

Results 


Hypothesis  1 

Tests requiring classification or rule-using behaviors for
unencountered instance items result in lower scores than when
the items consist of previously encountered instances.

Means and standard deviations on performance scores for the two treatment
groups are found in Table 4. A univariate analysis of variance indicated a
significant difference between previously encountered instance items and
unencountered instance items in the hypothesized direction: F (1,41) = 5.5,
p < .05.




Table 4

Mean Correct and Standard
Deviations by Treatment Group

                                          Treatment Group
                                     Upgraded          Regular
                                     (N = 23)          (N = 20)
Dependent Variable                   Mean    S.D.      Mean    S.D.

Performance on Encountered
  Instance Items                     76.1    02.5      74.5    02.D
Performance on Unencountered
  Instance Items                     69.7    02.5      68.6    02.9


Hypothesis 2

Unencountered instance items requiring classification or
rule-using behavior result in higher scores when the degree
to which the instruction follows ISDP principles is increased
over instruction which does not generally follow ISDP principles.

Means and standard deviations on performance scores for the two
treatment groups are found in Table 4. A univariate analysis of variance
indicated no significant difference between the two groups: F (1,41) = .29,
p > .05.

Hypothesis 3

Time to complete a rule-using or classification test is
greater when the degree to which instruction follows the ISDP
is increased over instruction which does not generally follow
ISDP principles.

As the testing place and time were out of our hands, time data were made
available for only 17 of the subjects. A t-test run on the available data
(Mean of 37, S.D. of 36 for the Upgraded group and Mean of 71, S.D. of 24
for the Regular group) indicated no significance: t(17) = 1.116, p > .05.

Discussion 


The study attempted to ascertain if one could improve student performance
on a physics test by upgrading his syllabus (mentioned earlier as the major
teaching device) by adhering to ISDP principles. Because we intervened in an
on-going class, this had to be attempted without controls on the teacher's
lectures, videotaped helps, or the text. Though the differences noted were
in the hypothesized direction (see Table 5), they were not significant. The
large standard deviations indicate that we had not captured a substantial
source of variability, probably due to course materials and information
beyond the syllabus. Contrary to indications on previous course
participation, gleaned from several former physics students and an instructor of the
course, a survey we ran indicated that all the students who answered the
questionnaire (N = 28) attended virtually all the class lectures and used
the text for each unit of material.


Table 5

Mean Percentage of Items Answered Correctly for
Students with Upgraded and Regular Materials

                          Upgraded Materials      Regular Materials
                          1st Try    2nd Try      1st Try    2nd Try

Encountered Items            76         79           76         77
Unencountered Items          69         74           66         65



Moreover,  the  "regular"  materials  were,  as  mentioned,  already  rather 
carefully  developed.  Although  they  did  not  support  each  stated  generality 
directly  and  consistently  with  examples  and  practice,  both  examples  and 
practice  were  available  to  the  student  who  hunted  for  them.  Thus,  since  we 
only upgraded the generalities present in the original syllabus (as a promise
not to change the course for one group of students), we were probably too
optimistic about the results the upgrade would create. That upgrading from
regular-class,  "non-developed"  materials  does  create  highly  significant  dif- 
ferences has  recently  been  demonstrated  in  a study  comparing  the  results  of 
students  taught  by  ISDP  and  "regular"  methods  in  nutrition  classes  (Richards, 
Richards,  & Merrill,  in  press). 

Our dependent measure also had constraints placed upon it which rendered
it less sensitive than it could be. The instructor felt that, to keep the
initial contract with his students, we had to keep the original seven questions
on which the students would be graded. In order to obscure which questions
these were, we had to write ours in the same essay format. Also, we were
allowed only to double the length of the test, so we could not go beyond
eight additional items. This was not sufficient to test all generalities at
least twice, especially since we were bound to the essay form. Since the great
majority of student test items consisted of encountered instance items, and
our questions consisted of unencountered instance items, this distinction was
easy to break out.

Though  time  was  not  a significant  effect,  it  would  be,  perhaps,  given  a 
sampling  of  all  the  students. 




The sample in the affective questionnaire was too small to use for
drawing any valid conclusions. There was generally mixed response from
both groups, with comments, when made, indicating some dissatisfaction
with the length of the upgraded materials over what they were used to but
a greater security in the subsequent test answers and a desire to go to
the syllabus for answers rather than moving from the syllabus to the text
as some previous students indicated they did.

This research helped establish several guidelines for further
intervention studies of this type. First, the experimenters should have control
over all the instruction, including lecture material. They cannot assume
that general nonparticipation at lectures in the past will be the case in
the present. Secondly, the experimenters must have full control of the test
and testing situation. This will allow a satisfactory and sensitive measure
of student performance on the generalities taught as well as make possible
complete data on time and affect. Following these guidelines, we can then
perhaps test the power of the ISDP against materials developed at a level
similar to that of the physics materials and test ISDP developed materials
over time within the framework of an on-going class. The rationale for
selecting this type and amount of subject matter content, as discussed in
the Methods section, is important to consider in conducting studies on the
effect of instructional materials on student performance.




CONCLUSIONS 


The research reviewed indicates that the propositions underlying the ISDP
seem to be valid. While the data reported in this document are somewhat
inconclusive and not sufficient to support unqualified statements, they are,
nevertheless, positive. When considered with other data on the ISDP (e.g.,
Wood, Richards, & Merrill, 1976), it seems reasonable to assume that, when
the ISDP is used as a guide to analyze and modify existing instruction,
the resulting performance of students is likely to be more effective. This
is especially likely when the tests as well as the main line instruction can
be modified. It is less likely when only the student syllabus is modified.

The ISDP does seem to have considerable potential as an instructional
evaluation and development tool.




RECOMMENDATIONS 


1. The ISDP, as presented in the ISDP training manual, is recommended
for use by Navy instructional developers and evaluators. However, it should
be considered an experimental tool and should be used only by experienced
instructional technologists who can appropriately adapt its use to various
settings and circumstances.

2. The present effort has increased our understanding of the ISDP and
has considerably increased our ability to diagnose and prescribe
modifications in existing instructional materials which result in improved student
performance. However, our understanding of the instructional diagnosis and
prescription process has merely scratched the surface. Because of its
apparent usefulness, it is recommended that ISDP validation and development
efforts continue so that this instrument can become an easy to use tool for
all instructional development and evaluation personnel.




REFERENCES 


Anderson, R. C. How to construct achievement tests to assess comprehension.
Review of Educational Research, 1972, 145-170.

Anderson, R. C., Goldburg, S. M., & Hidde, J. L. Meaningful processing of
sentences. Journal of Educational Psychology, 1971, 395-399.

Bloom, B. S. Time and learning. American Psychologist, 1974, 29, 682-688.

Bryce, G. R., & Carter, M. W. MAD: The analysis of variance in unbalanced
designs--a software package. Presented at COMPSTAT 1974: Proceedings in
Computational Statistics. Also in Bruckmann, G., Ferschl, F., &
Schmetterer, L. (Eds.), Physica Verlag, Wien, Wurzburg, Germany, 1974.

Christensen, H. B. Introductory statistics: A simplified approach.
Provo, UT: Brigham Young University Press, 1974.

Ehrenpreis, W., & Scandura, J. M. The algorithmic approach to curriculum
construction: A field test in mathematics. Journal of Educational Psychology,
1974, (4), 491-498.

Gagné, R. M. The conditions of learning. New York: Holt, Rinehart, &
Winston, 1970.

Glaser, R. Components of a psychology of instruction: Toward a science of
design. Review of Educational Research, 1976, 46, 1-24.

Glaser, R., & Resnick, L. B. Instructional psychology. Annual Review of
Psychology, 1972, 23, 207-276.

Gropper, G. L. The design of stimulus materials in response-oriented programs.
Audio Visual Communications Review, 1970, 18(2), 129-159.

Gropper, G. L. Diagnosis and revision in the development of instructional
materials. Englewood Cliffs, NJ: Educational Technology Publications, 1976.

House, E. R. School evaluation: The politics and process. Journal of
Educational Psychology, 1971, 395-399.

Landa, L. N. Algorithmization in learning and instruction. Englewood Cliffs, NJ:
Educational Technology Publications, 1974.

Markel, S. M. They teach concepts, don't they? Educational Researcher,
1975, 3-9.

Mayer, R. E. Information processing variables in learning to solve problems.
Review of Educational Research, 1975, 525-541. (a)

Mayer, R. E. Forward transfer of different reading strategies evoked by
test-like events in mathematics text. Journal of Educational Psychology, 1975,
67(2), 165-169. (b)




Mechner, F. Behavioral analysis and instructional sequencing.
In P. C. Lange (Ed.), Programmed Instruction. Chicago: NSSE, 1967.

Merrill, M. D., Olsen, J. R., & Coldeway, N. A. Research support for the
Instructional Strategy Diagnostic Profile (Tech. Rep.). Courseware, Inc.,
March 1976.

Merrill, M. D., & Wood, N. D. Instructional strategies: A preliminary
taxonomy. Columbus, OH: Ohio State University, 1974. (ERIC Document
Reproduction Service No. SE 018 771)

Merrill, M. D., & Wood, N. D. The instructional strategy diagnostic profile.
Provo, UT: Courseware, Inc., 1976.

Miller, G. A. The magical number seven plus or minus two: Some limits on
our capacity for processing information. Psychological Review, 1956, 63,
81-97.

Minsky, M. A framework for representing knowledge. Boston: Massachusetts
Institute of Technology, Artificial Intelligence Laboratory, 1974.

Richards, S., Richards, R. E., & Merrill, M. D. Improved test performance
via strategy intervention in a nutrition course. San Diego: Courseware,
Inc., in press.

Scandura, J. M. Role of rules in behavior: Toward an operational definition
of what (rule) is learned. Psychological Review, 1970, 77(6),
516-533.

Scandura, J. M. On higher order rules. Educational Psychologist, 1973, 10(3),
159-160.

Scandura, J. M. Role of higher order rules in problem solving. Journal of
Experimental Psychology, 1974, 102(6), 984-991.

Scandura, J. M., & Durnin, J. H. Extra-scope transfer in learning mathematical
strategies. Journal of Educational Psychology, 1968, 350-354.

Schmidt, R. V., Wood, N. D., & Merrill, M. D. Test and generality consistency
in a classification task in validation of the Instructional Strategy Diagnostic
Profile (ISDP): Empirical studies (Tech. Rep.). San Diego: Courseware, Inc.,
in press.

Shoemaker, D. N. Toward a framework for achievement testing. Review of
Educational Research, 1975, 45, 127-147.

Stake, R. E. Measuring what learners learn. In E. R. House (Ed.), School
Evaluation: The Politics and the Process. Berkeley, CA: McCutchan Publishing
Corporation, 1973.




Wood, N. D., Richards, R. E., & Merrill, M. D. Prediction of student
performance on rule-using tasks from the diagnosis of instructional
strategies. Provo, UT: Brigham Young University research paper, 1976.

Yum, K. W. An experimental test of the law of assimilation. Journal of
Experimental Psychology, 1931, 14, 68-82.




APPENDIX 


EVALUATION  OF  TEST  ITEM  TYPE  AND 
STUDENT  PERFORMANCE  IN  PHYSICS  100 




Introduction 


The Physics Department at Brigham Young University requested that the
Division of Instructional Research, Development and Evaluation make
recommendations for improving the basic physics course at the University. During
the Fall Semester of 1975, two-thirds of the students either withdrew
unofficially from the course or received a grade of incomplete. Student pacing
problems obviously contributed to these results, but problems no doubt existed
with the tests and instruction as well.

Previous evaluation of the Physics 100 course at BYU has demonstrated
that the course, intended to teach conceptual matter content, contained
material that was often deficient in the rule-example-practice proposition
of the Instructional Strategy Diagnostic Profile.

This study examined the conceptual correspondence of the test items to
the test prescriptions of the ISDP. Student performance was compared on the
various types of items that were included on the tests.

Method 


The Physics Department provided the pool of test items from which all of
the tests administered to the students were constructed. Each of the test
items was classified into one of five categories according to the type of
content it measured:

1. Unencountered inquisitory instance (Ieg)--for questions in which
the student was asked to apply a rule (given or not given) to a particular
instance not previously encountered.

2. Partially encountered inquisitory instance--for the same type of
questions as in number one above, but where the particular instance had been
only partially encountered before.

3. Encountered inquisitory instance--for the same type of questions as
in numbers one and two above, but where the particular instance had been
previously encountered in the instructional materials.

4. Inquisitory generality (IG)--for questions in which the student was
asked to remember or recognize a rule statement or concept definition.

5. Miscellaneous category (M)--for questions where the student was asked:
(a) to cite evidence (data or logic) for a given proposition, (b) to give or
recognize superordinate, coordinate, or subordinate relationships among or
between propositions or concepts, or (c) to remember a given constant or some
specific piece of data, a fact, etc., which is an identity.

Most of the test items contained more than one category of question within
the item. If any Ieg questions occurred within an item, the whole item was
classified Ieg. If an IG question was combined with an M question, the whole
item was classified M. Interrater reliability was strengthened by having
both raters rate the same items separately and then compare the results. The
few disagreements were discussed until consensus was reached on all items.
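
The precedence rules for items that mix question categories can be sketched as a small function. This is only an illustration: the category labels are written as plain strings, the function name is hypothetical, and combinations the text does not address are returned as undecided rather than guessed.

```python
def classify_item(question_categories):
    """Collapse the categories of an item's questions into one item category.

    Follows the two rules stated in the text:
      - any Ieg question makes the whole item Ieg;
      - IG combined with M makes the whole item M.
    Returns None for combinations the text does not cover.
    """
    categories = set(question_categories)
    if not categories:
        raise ValueError("an item must contain at least one question")
    if "Ieg" in categories:
        return "Ieg"
    if categories == {"IG", "M"}:
        return "M"
    if len(categories) == 1:
        return next(iter(categories))
    return None  # not covered by the stated rules
```

For example, an item whose questions were rated `["IG", "Ieg", "M"]` would be classified Ieg, and one rated `["IG", "M"]` would be classified M.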




An item was classified as encountered if the answers to two-thirds or
more of the questions constituting the item were found anywhere in the text,
the syllabus, or the television lectures. An item was classified as
unencountered if one-third or fewer of the questions constituting the item were
encountered in the above mentioned sources. Items falling in between these
two cutoff points or items where a similar but not identical instance was
encountered in the lesson materials were classified as partially encountered.
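
The two cutoff points can be expressed as a short function. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and exact fractions are used so that an item sitting exactly on a cutoff (two of three questions, or one of three) is classified as the text describes.

```python
from fractions import Fraction


def encounter_status(n_found, n_questions, similar_instance_only=False):
    """Classify an item by the share of its questions found in the sources.

    n_found: questions whose answers appear in the text, syllabus, or lectures.
    n_questions: total questions constituting the item.
    similar_instance_only: True when only a similar, not identical, instance
        was encountered; the text treats such items as partially encountered.
    """
    if n_questions <= 0 or not 0 <= n_found <= n_questions:
        raise ValueError("need 0 <= n_found <= n_questions and n_questions > 0")
    if similar_instance_only:
        return "partially encountered"
    share = Fraction(n_found, n_questions)
    if share >= Fraction(2, 3):
        return "encountered"
    if share <= Fraction(1, 3):
        return "unencountered"
    return "partially encountered"
```

An item with two of its three questions found in the sources would be rated encountered; one with two of its four questions found would fall between the cutoffs and be rated partially encountered.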

Independent  Variables 

An analysis of variance was run, using three independent variables:
(1) the three examinations over three different areas of subject matter, (2)
the seven test items used on each test, and (3) the five categories indicating
the content type of each item.

The  first  examination  covered  the  first  six  chapters  of  the  text  and 
aimed  at  a conceptual  understanding  of  Newton's  first  two  laws  of  motion.  The 
second  examination  covered  chapters  seven  through  ten  of  the  text  and  aimed  at 
a conceptual  understanding  of  the  laws  of  force  and  motion,  conservation  of 
energy,  the  kinetic  theory  of  matter,  and  the  law  of  entropy.  The  third 
examination  covered  chapters  11  through  14  and  aimed  at  a conceptual  under- 
standing of  the  properties  of  waves,  electricity,  and  magnetism. 

The  test  item  number  was  included  as  an  independent  variable  because 
it  served  as  an  index  of  the  difficulty  level  of  the  various  items.  The  sixth 
and  sevi-nth  items  on  each  test  (A  level  items)  were  designed  by  the  developers 
of  the  test  to  be  more  difficult  than  the  fourth  and  fifth  items  (B  level 
items),  and  these  in  turn  were  designed  to  be  more  difficult  than  items  one, 
two,  and  three  (C  level  items).  An  inclusion  of  this  variable  in  the  analysis 
of  variance  enabled  an  empirical  evaluation  of  the  preassessed  difficulty 
levels  of  the  items. 
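
As a rough illustration of how the preassessed levels can be checked against performance, the sketch below groups the overall item-number means reported in Table A-1 by level. The helper name `mean_by_level` is hypothetical, and averaging the item means without weights is an assumption; the report's analysis of variance is not reproduced here.

```python
# Overall item-number means (percent correct) as reported in Table A-1.
ITEM_MEANS = {1: 84.816, 2: 81.250, 3: 75.418, 4: 71.590,
              5: 68.745, 6: 58.372, 7: 58.182}

# Difficulty level assigned to each item position by the test developers.
LEVEL_OF_ITEM = {1: "C", 2: "C", 3: "C", 4: "B", 5: "B", 6: "A", 7: "A"}


def mean_by_level(item_means, level_of_item):
    """Return the unweighted mean percent correct for each difficulty level."""
    sums, counts = {}, {}
    for item, mean in item_means.items():
        level = level_of_item[item]
        sums[level] = sums.get(level, 0.0) + mean
        counts[level] = counts.get(level, 0) + 1
    return {level: sums[level] / counts[level] for level in sums}


level_means = mean_by_level(ITEM_MEANS, LEVEL_OF_ITEM)
```

On these figures the C level items have the highest mean and the A level items the lowest, which is the ordering the developers intended.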


The  five  content  type  categories  were  included  to  assess  which  types 
of  questions  were  being  answered  most  effectively  by  the  students. 


Dependent  Variables 


The  Physics  Department  had  already  gathered  data  on  the  number  of 
students  that  had  missed  each  item  in  the  pool  of  test  items.  The  number  of 
times  each  item  was  used  on  a test  was  calculable  from  knowing  the  total  number 
of  tests  given  and  the  procedure  used  to  generate  the  various  tests  that  were 
used.  From  the  above  information,  the  percentage  of  students  answering  each 
test  item  correctly  could  be  determined.  This  percentage  was  used  as  the 
dependent  variable  in  the  analysis  of  variance  reported  in  the  results  section 
of  this  paper. 
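
The dependent variable can be expressed as a one-line calculation. The following is a minimal sketch; the function name `percent_correct` and the counts in the usage note are hypothetical and are not data from the study.

```python
def percent_correct(times_used: int, times_missed: int) -> float:
    """Percentage of students answering an item correctly.

    times_used: number of tests on which the item appeared.
    times_missed: number of students who missed the item.
    """
    if times_used <= 0:
        raise ValueError("the item must have been used at least once")
    if not 0 <= times_missed <= times_used:
        raise ValueError("times_missed must lie between 0 and times_used")
    return 100.0 * (times_used - times_missed) / times_used
```

For example, an item that appeared on 200 tests and was missed 50 times would yield `percent_correct(200, 50)`, that is, 75.0.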


The final data analysis design was a 3 x 7 x 5 matrix that can most
clearly be understood by looking at the design diagram in Figure A-1.




Figure  A-1.  Design  for  evaluation  of  test  item 
type  and  student  performance. 
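
The 3 x 7 x 5 design can also be enumerated directly, which makes the number of cells explicit. This is only a sketch: the variable names are illustrative, and the content-type labels follow the question-type names used in the tables of this appendix.

```python
from itertools import product

EXAMS = (1, 2, 3)                  # three examinations
ITEM_NUMBERS = tuple(range(1, 8))  # seven item positions on each test
CONTENT_TYPES = (
    "unencountered Ieg",
    "partially encountered Ieg",
    "encountered Ieg",
    "IG",
    "M",
)

# Every (examination, item number, content type) combination in the design.
design_cells = list(product(EXAMS, ITEM_NUMBERS, CONTENT_TYPES))
```

The list holds 3 x 7 x 5 = 105 cells, one for each combination of the three independent variables.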


Results


Table A-1 shows the mean percentages of students answering items
correctly on each of the three examinations, the mean percentages of students
answering questions correctly under each test item number, and the means
formed by the examination and item number interactions. An analysis of
variance showed that the differences among examination means were
significant (F = 16.366, p < .01), as were differences among item number means
(F = 23.747, p < .01).


Table A-1

Examination vs. Item Number Matrix: Mean Percentage
of Students Answering Items Correctly

Item                 Examination
Number        1          2          3        Means

1          87.945     83.141     82.998     84.816
2          86.447     80.196     75.807     81.250
3          72.444     82.376     72.177     75.418
4          79.284     71.663     61.232     71.590
5          73.763     63.785     67.017     68.745
6          72.300     60.378     46.421     58.372
7          58.679     60.142     56.509     58.182

Means      76.021     73.291     65.811

Table A-2 shows the mean percentages of students answering each type of
test question correctly, the mean percentages of students answering questions
correctly under each test item number, and the means formed by the content
type and item number interactions.

Table A-3 shows the mean percentages of students answering items correctly
on each of the three examinations, the mean percentages of students answering
each type of test question correctly, and the means formed by examination and
content type interactions. An analysis of variance showed that the differences
among the content-type means (66.077, 72.505, 73.388, 73.024, and 74.159) were
also significant (F = 2.649, p < .05).

The nature of these differences was analyzed using prediction coefficients
and is reported in the discussion section which follows. Table A-4 gives the
percentages of items used from each content type on each examination.




Table A-2

Content Type vs. Item Number Matrix: Mean Percentage
of Students Answering Items Correctly

                                  Question Type
Item      Unencountered   Partially          Encountered
Number    Ieg             Encountered Ieg    Ieg            IG        M         Means

1         82.045          86.137             87.556         80.146    87.273    84.816
2         77.138          83.243             78.217         87.611    79.333    81.250
3         59.209          79.713             74.180         78.796    81.220    75.418
4         61.622          70.549             77.684         72.476    72.666    71.590
5         64.186          92.000             70.486         63.813    0         68.745
6         58.785          45.253             66.425         41.333    76.250    58.372
7         64.148          53.574             59.166         65.864    48.610    58.182

Means     66.077          72.505             73.388         73.024    74.159

Table A-3

Examination vs. Content Type: Mean Percentage of Students
Answering Correctly Items of Each Question Type

                                      Examination
Question Type                   1          2          3        Means

Unencountered Ieg            71.876     64.144     61.568     66.077
Partially Encountered Ieg    78.058     73.504     59.735     72.505
Encountered Ieg              78.084     75.238     66.842     73.388
IG                           78.200     73.470     68.204     73.024
M                            73.544     78.495     70.743     74.159

Means                        76.021     73.291     65.811



Table A-4

The Percentages of Items Used from Each Question
Type on Each Examination

                                   Examination
Question Type                   1        2        3

Unencountered Ieg              31%      20%      11%
Partially Encountered Ieg      19%       7%       5%
Encountered Ieg                25%      35%      46%
IG                             10%      34%      29%
M                              15%       4%       9%

Total                         100%     100%     100%

Discussion


The difference among examinations indicates that significantly fewer
students responded correctly to the items on the third exam (see Table A-1).
It is possible that the items were more difficult or that, because of the
end of the semester, fewer students retook the third exam than retook the
first and second exams. Each time a student retook an examination, even
though the items were different from those on the previous exam, he was likely
to do better than he did the time before because of more study in the area
where he was deficient. This would mean that the average percentage of students
answering items correctly was artificially elevated for both the first and
second exams, more so for the first than for the second. Regardless of the
question type, items on the third exam were missed more often than the
corresponding type of items on the other two exams (see Table A-3).

The  sixth  and  seventh  test  questions  on  each  exam  were  consistently  more 
difficult  than  all  of  the  other  questions  (see  Table  A-1).  However,  question 
six  on  exam  one  was  not  significantly  more  difficult  than  questions  three  and 
five.  The  overall  means  for  the  seven  question  numbers  indicate  that  questions 
four  and  five  fell  in  the  middle  range  of  difficulty  as  intended,  but  this  was 
not  consistent  when  the  three  exams  were  considered  separately. 

Although questions six and seven were more difficult than the others, they
were not measuring a higher level of conceptual understanding, as might be
hoped, but rather more obscure details encountered in the text, syllabus, or
videotapes. It might be more meaningful to use previously unencountered
questions as A and B level items. This would tend to award Bs and As on the
basis of a better conceptual understanding of the material rather than on the
basis of ability to remember more obscure detail.




Regardless of the type of question involved, items six and seven were
consistently missed more frequently than the other items (see Table A-2).
This is likely a reflection of the tendency for A level items to deal with
obscure details. It is also interesting that on the unencountered inquisitory
instance questions (unencountered Ieg) for the B and C levels, each of the
mean percentages falls below the grand mean for its respective question number.
This is as we would expect for more difficult questions. Yet for the A level
questions, the unencountered Ieg questions have mean percentages equal to or
higher than the grand means for their respective question numbers. This may
mean that the students have acquired a set response to unencountered unobscure
items versus unencountered obscure items. For example, they may be skipping
the unencountered unobscure items without spending much time on them because
they realize that they have never seen them before. At the same time, because
the A level items involve more obscure material, they are spending more
time to think and are coming up with more right answers on their own.
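
The comparison with the grand means can be reproduced from the figures in Table A-2. The sketch below is illustrative only; the variable names are assumptions, and the values are copied from the unencountered Ieg column and the Means column of that table.

```python
# Mean percent correct on unencountered Ieg questions, by item number (Table A-2).
UNENCOUNTERED_IEG = {1: 82.045, 2: 77.138, 3: 59.209, 4: 61.622,
                     5: 64.186, 6: 58.785, 7: 64.148}

# Grand mean percent correct for each item number (Means column of Table A-2).
ITEM_MEANS = {1: 84.816, 2: 81.250, 3: 75.418, 4: 71.590,
              5: 68.745, 6: 58.372, 7: 58.182}

# Item numbers where the unencountered Ieg mean falls below the grand mean.
below = [i for i in sorted(UNENCOUNTERED_IEG)
         if UNENCOUNTERED_IEG[i] < ITEM_MEANS[i]]

# Item numbers where it equals or exceeds the grand mean.
at_or_above = [i for i in sorted(UNENCOUNTERED_IEG)
               if UNENCOUNTERED_IEG[i] >= ITEM_MEANS[i]]
```

Items one through five (the C and B levels) fall in `below`, and items six and seven (the A level) fall in `at_or_above`, matching the pattern described in the text.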

The unencountered instance questions are significantly more difficult
(p < .05) than all other test question types, as we might expect if they were
measuring understanding at a conceptual level rather than at just a memory level
(see Table A-3). The partially encountered instance questions, the encountered
instance questions, the inquisitory generality questions, and the miscellaneous
questions all seem to be at the same level of difficulty for the students.
However, the partially encountered instance items on test three are slightly
(though not significantly) more difficult than the unencountered instance items.
The partially encountered items are more similar to the encountered items in
tests one and two, but more similar to the unencountered items in test three.

The percentages of unencountered instance questions on the various exams
are also very interesting. Because they are the most difficult items, one
might have expected exams containing a higher percentage of such items to
show poorer student performance. However, the opposite relationship
appeared (see Table A-4). Although test three was the most
difficult for the students, it had only 11 percent of the most difficult
question type. This might mean that the subject matter tested in test
three was inherently more difficult or that the instruction in this area
was weaker.
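
The relationship described above can be checked with a short calculation. The sketch below computes a Pearson correlation between the share of unencountered instance items on each exam (Table A-4) and the exam means (Table A-3). The function name `pearson_r` and the choice of Pearson's coefficient are assumptions for illustration, not the report's procedure. Note that the sign depends on how performance is expressed: against percent correct the coefficient is positive, and against percent missed it is negative. With only three exams the figure is descriptive at best.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Percent of unencountered Ieg items on exams 1, 2, and 3 (Table A-4).
share_unencountered = [31, 20, 11]
# Mean percent of students answering correctly on exams 1, 2, and 3 (Table A-3).
mean_correct = [76.021, 73.291, 65.811]

r_correct = pearson_r(share_unencountered, mean_correct)

# The same relationship expressed as percent of students missing items.
mean_missed = [100 - m for m in mean_correct]
r_missed = pearson_r(share_unencountered, mean_missed)
```

Here `r_correct` is strongly positive and `r_missed` is its negative: the exam with the fewest unencountered instance items was also the exam on which students did worst.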

The present study will be expanded to see if student performance on
unencountered instance questions could be improved by following the principles of
effective instruction recommended in the Instructional Strategy Diagnostic
Profile.




DISTRIBUTION  LIST 


Chief of Naval Operations (OP-987P10), (OP-991B)
Chief of Naval Education and Training (OOA)
Chief of Naval Education and Training Support
Chief of Naval Education and Training Support (01A), (N-5)
Chief of Naval Technical Training (Code 016)
Chief of Naval Material (NMAT 035)
Chief of Naval Research (Code 450) (4)
Chief of Naval Personnel (Pers-10c)
Chief of Information (01-2252)
Commanding Officer, Naval Aerospace Medical Institute (Library Code 12) (2)
Commanding Officer, Naval Education and Training Program Development Center
Commanding Officer, Naval Development and Training Center (Code 0120)
Officer in Charge, Naval Education and Training Information Systems Activity
Director, Training Analysis and Evaluation Group (TAEG)
Director, Defense Activity for Non-Traditional Education Support
Personnel Research Division, Air Force Human Resources Laboratory (AFSC),
   Lackland Air Force Base
Occupational and Manpower Research Division, Air Force Human Resources
   Laboratory (AFSC), Lackland Air Force Base
Technical Library, Air Force Human Resources Laboratory, Lackland Air Force Base
Technical Training Division, Air Force Human Resources Laboratory,
   Lowry Air Force Base
Program Manager, Life Science Directorate, Air Force Office of Scientific
   Research (AFSC)
Army Research Institute for the Behavioral and Social Sciences
Coast Guard Headquarters (G-P-1/62)
Military Assistant for Training and Personnel Technology, ADDR&E, OAD(E&LS)
Director for Acquisition Planning OASD(I&L)
Defense Documentation Center (12)




